forward process
Rethinking Forward Processes for Score-Based Data Assimilation in High Dimensions
Yoon, Eunbi, Kim, Donghan, Kim, Dae Wook
Data assimilation is the process of estimating the time-evolving state of a dynamical system by integrating model predictions and noisy observations. It is commonly formulated as Bayesian filtering, but classical filters often struggle with accuracy or computational feasibility in high dimensions. Recently, score-based generative models have emerged as a scalable approach for high-dimensional data assimilation, enabling accurate modeling and sampling of complex distributions. However, existing score-based filters often specify the forward process independently of the data assimilation task. As a result, the measurement-update step depends on heuristic approximations of the likelihood score, which can accumulate errors and degrade performance over time. Here, we propose a measurement-aware score-based filter (MASF) that defines a measurement-aware forward process directly from the measurement equation. This construction makes the likelihood score analytically tractable: for linear measurements, we derive the exact likelihood score and combine it with a learned prior score to obtain the posterior score. Numerical experiments covering a range of settings, including high-dimensional datasets, demonstrate improved accuracy and stability over existing score-based filters.
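The score decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the MASF construction itself: it only shows Bayes' rule in score form for the undiffused state, assuming a linear-Gaussian measurement y = Hx + v with v ~ N(0, R). The names `likelihood_score`, `gaussian_prior_score` and `posterior_score` are hypothetical, and the Gaussian prior score stands in for a learned one.

```python
import numpy as np

def likelihood_score(x, y, H, R):
    """grad_x log N(y; Hx, R) = H^T R^{-1} (y - Hx), exact for linear-Gaussian measurements."""
    return H.T @ np.linalg.solve(R, y - H @ x)

def gaussian_prior_score(x, m, P):
    """Stand-in for a learned prior score (assumption): grad_x log N(x; m, P)."""
    return -np.linalg.solve(P, x - m)

def posterior_score(x, y, H, R, prior_score):
    """Bayes' rule in score form: posterior score = prior score + likelihood score."""
    return prior_score(x) + likelihood_score(x, y, H, R)

# Toy problem: 3-dimensional state, 2-dimensional measurement.
m = np.array([0.5, -1.0, 2.0])            # prior mean
P = np.diag([1.0, 2.0, 0.5])              # prior covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])           # measurement matrix
R = 0.1 * np.eye(2)                       # measurement-noise covariance
y = np.array([0.7, 0.3])                  # observed measurement

# For a Gaussian prior the posterior mean is the Kalman update,
# and the posterior score must vanish there.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
x_post = m + K @ (y - H @ m)
score_at_mean = posterior_score(x_post, y, H, R,
                                lambda x: gaussian_prior_score(x, m, P))
```

The sketch covers only noise level zero. How the paper's measurement-aware forward process keeps the likelihood score exact at every diffusion time is not reproduced here.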
Sharp Convergence Rates for Masked Diffusion Models
Liang, Yuchen, Tan, Zhiheng, Shroff, Ness, Liang, Yingbin
Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging as competitive alternatives to autoregressive models. Among existing samplers, the Euler method remains the standard choice in many applications, and more recently, the First-Hitting Sampler (FHS) has shown considerable promise for masked diffusion models. Despite their practical success, the theoretical understanding of these samplers remains limited. Existing analyses are conducted in Kullback-Leibler (KL) divergence, which often yields loose parameter dependencies and requires strong assumptions on score estimation. Moreover, these guarantees do not cover the recently developed, high-performance FHS. In this work, we first develop a direct total-variation (TV) based analysis for the Euler method that overcomes these limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees without requiring any surrogate initialization. In the same setting, we provide the first convergence lower bound for the Euler sampler, establishing tightness with respect to both the data dimension $d$ and the target accuracy $\varepsilon$. Finally, we analyze FHS and show that it incurs no sampling error beyond that induced by score estimation, a guarantee we show to be tight via a matching lower bound on the error. Overall, our analysis introduces a direct TV-based error decomposition along the continuous-time Markov chain (CTMC) trajectory and a decoupling-based path-wise analysis for FHS, which may be of independent interest.
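The two samplers named in the abstract can be contrasted with a toy sketch. It rests on assumptions that are not from the paper: a linear masking schedule in which a token is masked at time t with probability t, a hypothetical `toy_denoiser` that draws tokens uniformly in place of a learned model, and a hypothetical `MASK` id. Under that schedule a masked token is revealed between times t and s < t with probability (t - s)/t, and with n masked tokens the next reveal time is distributed as t * u^(1/n) for u uniform on [0, 1).

```python
import numpy as np

MASK = -1  # hypothetical id for the mask token

def toy_denoiser(x, pos, rng, vocab_size):
    """Hypothetical stand-in for a learned model: draws a token for position pos."""
    return int(rng.integers(vocab_size))

def euler_sampler(d, vocab_size, num_steps, rng):
    """Fixed time grid: each masked token is revealed independently per step."""
    x = np.full(d, MASK)
    times = np.linspace(1.0, 0.0, num_steps + 1)
    for t, s in zip(times[:-1], times[1:]):
        p_unmask = (t - s) / t            # equals 1 on the last step (s = 0)
        x_start = x.copy()                # predictions condition on the state at time t
        for pos in np.flatnonzero(x_start == MASK):
            if rng.random() < p_unmask:
                x[pos] = toy_denoiser(x_start, pos, rng, vocab_size)
    return x

def first_hitting_sampler(d, vocab_size, rng):
    """Jumps directly to each reveal time and unmasks exactly one token per jump."""
    x = np.full(d, MASK)
    t, hit_times = 1.0, []
    for n in range(d, 0, -1):             # n = number of tokens still masked
        t = t * rng.random() ** (1.0 / n) # largest of n uniform reveal times on [0, t]
        masked = np.flatnonzero(x == MASK)
        pos = int(rng.choice(masked))     # by symmetry, uniform over masked positions
        x[pos] = toy_denoiser(x, pos, rng, vocab_size)
        hit_times.append(t)
    return x, hit_times

rng = np.random.default_rng(0)
x_euler = euler_sampler(d=16, vocab_size=5, num_steps=8, rng=rng)
x_fhs, hit_times = first_hitting_sampler(d=16, vocab_size=5, rng=rng)
```

In this toy setting the Euler sampler may reveal several tokens within one step, all predicted from the same starting state, while the first-hitting sampler reveals one token per jump and therefore makes exactly d denoiser calls.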